\section{Conclusion} \label{sec:conclusion} In this paper we have compared three
strategies for using reinforcement learning to learn to play Othello: learning by
self-play, learning by playing against a fixed opponent, and learning by playing
against a fixed opponent while also learning from the opponent's moves. We found
that the best training strategy differs per algorithm: Q-learning and Sarsa
obtain the highest performance when training against the same opponent they are
tested against (while \textit{not} learning from the opponent's moves), whereas
TD-learning learns best from self-play. Differences in the strength of the
training opponent appear to be reflected in the eventual performance of the
trained agents.

Future work might take a closer look at the influence of the training opponent's
play style on that of the learning agent. In our research, the styles of
\textsc{heur} and \textsc{bench} were similar, and the differences in eventual
performance were only analyzed in terms of a score. It would be interesting to
experiment with fixed opponents that follow more diverse strategies and to
analyze, in a more qualitative fashion, how these strategies influence the
eventual play style of the learning agent.


